This paper presents ReVive, a novel general-purpose rollback recovery mechanism for
shared-memory multiprocessors. ReVive carefully balances the conflicting requirements of 
availability, performance, and hardware cost. ReVive performs checkpointing, logging, 
and distributed parity protection, all memory-based. It enables recovery from a wide class 
of errors, including the permanent loss of an entire node. To maintain high performance, 
ReVive includes specialized hardware that performs frequent o ... 
We develop an availability solution, called SafetyNet, that uses a unified, lightweight
checkpoint/recovery mechanism to support multiple long-latency fault detection schemes. 
At an abstract level, SafetyNet logically maintains multiple, globally consistent 
checkpoints of the state of a shared memory multiprocessor (i.e., processors, memory, 
and coherence permissions), and it recovers to a pre-fault checkpoint of the system and 
In this paper, we develop the first feasibly implementable scheme for end-to-end dynamic
verification of multithreaded memory systems. For multithreaded (including 
multiprocessor) memory systems, end-to-end correctness is defined by its memory 
consistency model. One such consistency model is sequential consistency (SC), which 
specifies that all loads and stores appear to execute in a total order that respects program 
order for each thread. Our design, DVSC-Indirect, performs dynamic verification ... 
An abstract model of rollback recovery control in distributed systems
Jiannong Cao, K. C. Wang 

This paper develops an abstract model which presents a method of uniform description of 
different rollback recovery control algorithms for distributed systems. We first developed 
a general definition of the distributed rollback recovery control problem. The concept of a 
distributed recovery control system (DRC system), consisting of distributed recovery
control units (DRC units), is proposed to model recovery with various control 
granularities. Then, we developed a graph model, cal ... 